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Abstract 

We present a Perfect Sampling algorithm that can be applied to the Master Equation of 
Gene Regulatory Networks (GRNs). The method recasts Gillespie's Stochastic Simulation 
Algorithm (SSA) in the light of Markov Chain Monte Carlo methods and combines it with 
the Dominated Coupling From The Past (DCFTP) algorithm to provide guaranteed sampling 
from the stationary distribution. We show how the DCFTP-SSA can be generically applied to 
genetic networks with feedback formed by the interconnection of linear enzymatic reactions 
and nonlinear Monod- and Hill-type elements. We establish rigorous bounds on the error 
and convergence of the DCFTP-SSA, as compared to the standard SSA, through a set of 
increasingly complex examples. Once the building blocks for GRNs have been introduced, 
the algorithm is applied to study properly averaged dynamic properties of two experimentally 
relevant genetic networks: the toggle switch, a two-dimensional bistable system, and the 
repressilator, a six-dimensional genetic oscillator. 
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Introduction 

The recent interest in stochastic models of chemical reactions has been largely motivated by exper- 
iments that have demonstrated the stochastic nature of key processes in the cell, such as signalling 
or gene regulation 0, H, 0, M, H, 0, H, S The inherent stochasticity of these biochemical 
processes is due to the low number of molecules involved in the reaction s dill) , amplified in some 
cases by the proximity to a critical point of the dynamics of the system (|12l ). Low copy numbers 



lead to intrinsic fluctuations and to the breakdown of models based on differential equations (|13l ) . 
A number of stochastic models of gene reg ulation |lll, Q E 0, E3, El 0, M, HII) and signalling 



(22, 23, 24) have been formulated and analyzed numerically with the aim of identifying the sources 
of randomness, and how noise is controlled and harnessed in the cellular environment. 

The starting point for such stochastic descriptions is the Master Equation (ME), a conservation 
equation that gives the time evolution of the probability distribution of the state of the system. 
For a discrete state space, it can be written as fl3h : 

P^t) = J2 w * p M -WtjPM (i) 

i 

where Pj(t) is the probability of finding the system in state j at time t and Wjt is the transition 
rate from state i to state j. The application of the ME to chemical systems goes back at least to 
the study of an irreversible reaction by Delbriick |25h. This work has been extended by a number 



of authors (see McQuarrie (|26f ) and Paulsson (|27l ) for reviews) . Eq. [T] is usually referred to as the 



Chemical Master Equation (CME) when it describes stochastic versions of Law of Mass Action 
systems. In this paper, we extend the notation and apply the term CME to compound reaction 
rates (e.g., Michaelis-Menten) as well. 

Although theoretically rigorous, there are very few systems for which the ME has been solved 
analytically. Direct numerical integration is often very difficult due to the fact that the state 
space grows very rapidly and also to the stiffness of the equations. One way to overcome these 
issues is to employ some kind of approximation scheme. In certain limits, one can consider contin- 
uum approximations leading to partial differential equations, such as van Kampen's Linear Noise 



Approximation and Fokker-Planck equations (|13f ). However, these approximations disregard the 



discreteness of the state of the system, and can therefore give rise to significant deviations when 
the number of molecules is very small, as can be the case for GRNs. 

A different approach for dealing with the CME numerically is provided by the widely used 
Stochastic Simulation Algorithm (SSA) by Gillespie ([H). The SSA is a kinetic Monte Carlo 
algorithm, rigorously derived from the same assumptions as the CME, which gives realizations of 
the trajectory of a given CME. The SSA follows an idea that can be traced back to Doob (f29h 
and provides an exact procedure for the kinetic sampling of the CME in the sense that 
we obtain an unbiased and convergent estimate of the solution of the CME. Due to its biological 
applications, there has been considerable interest in the SSA in recent years and several extensions 
have been proposed (e.g., the explicit introduction of space (31, 32) and time-delays (0)), as well 
as algorithmic speed-up improvements Il33t|34 




In many experiments of interest la HH3) , and in related theoretical analyses ((ill , [lil [l6l . 
18, 12, 24, 35, H), the system at hand is assumed to have reached the stationary distribution, i.e., 



the left hand side of Eq. Q]is zero. Consider the ME (Eq. [T]) in operator form: 

P = WP diag(e T WOP = Q T P, (2) 

where P is a (possibly infinite-dimensional) column vector of probabilities, e is the vector of 
ones, and W is the transition matrix. The stationary distribution P* (usually denoted tt in 



Markov theory) could in principle be obtained from the first left eigenvector of the Q-matrix (|37l ). 
However, this approach is impractical due to the curse of dimensionality: the run-time and memory 
requirements scale exponentially with the number of types of molecules. For small state spaces, 
one can still consider a (truncated) finite subspace to obtain an approximate convergent solution. 
Our own accurate version of this approximate eigenvector method is used in this paper to check 
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the accuracy of our DCFTP-SSA algorithm when an analytical expression for the stationary 
distribution is not known. 

In order to sample from the stationary solution of Eq. [TJ one would have to run the SSA for an 
'infinite' time. Of course, the most common practical solution is to run the algorithm 'repeatedly' 
for a 'very long time' and hope that the system has reached the stationary distribution when the 
run is stopped (|20l ). The lack of a termination certificate can lead to computational inefficiency 
if the runs are longer than necessary or, more importantly, to mis-sampling from the wrong 
distribution if the runs are not long enough. 

In this paper we present an algorithm (DCFTP-SSA) that guarantees perfect sampling of 
the stationary solution of the CME for the general class of biochemical networks formed by the 
interconnection of generalized Hill- and Monod-type reactions, the canonical models for GRNs 
and enzymatic networks. The DCFTP-SSA builds on the standard SSA and considers it in the 
light of Markov Chain Monte Carlo (MCMC) algorithms, specifically within the Dominated Cou- 
pling From The Past (DCFTP) framework introduced by Propp and Wilson (|38T ) and Kendall 
(1391 ). Henceforth, we introduce the algorithm and its properties in detail through the analysis of 
the building blocks of GRNs. We then apply the DCFTP-SSA to the study of two systems of 
experimental interest in synthetic biology: the toggle switch (0) and the repressilator 



The Dominated Coupling From The Past — 
Stochastic Simulation Algorithm (DCFTP-SSA) 

In many applications we are interested in sampling from a complicated distribution that cannot 
be written down explicitly. Markov Chain Monte Carlo (MCMC) methods can sometimes provide 
an answer by setting up a Markov chain which has the desired distribution as its stationary 
distribution. Sampling from the target distribution is then achieved by evolving the Markov chain 



until it has reached its stationary distribution (|37t ). The major issue in these schemes is when to 



stop the Markov chain. This problem was addressed by Propp and Wilson with their Coupling 
From The Past (CFTP) algorithm®, a celebrated example of what are commonly referred to 
as Perfect Sampling algorithms (|40l ). The version used here is the Dominated Coupling From 
The Past (DCFTP) algorithm, the extension introduced by Kendall ((391 ) to study continuous-time 
Markov processes on an unbounded state space. We now give a brief introduction to the algorithm. 

In principle, a Markov chain would have to be run for an infinite time in order to reach its 
stationary distribution. The CFTP algorithm, however, recasts the problem as a procedure in 
which the running time becomes a random variable but a certificate is issued if the stationary 
distribution has been reached. This is signalled by the coalescence of relevant coupled Markov 
chains. Markov chains are coupled if they have the same transition rules and use the same sequence 
of random numbers for their realization. Coupled Markov chains are said to coalesce when they 
meet. Clearly, coupled chains with the same transition rule (but started from different initial 
states) will have the same values for all future times after the coalescence time T c . Propp and 
Wilson proposed to use Markov chains coupled from the past to ensure that the whole state space 
of initial conditions maps to the same state at present. By extending sufficiently far back into the 
past and using a fixed sequence of random numbers, we will eventually reach a time from which all 
paths map to the same state at t = 0. This condition is equivalent to the stationary distribution, 
since the starting condition is irrelevant for the current state. 

In many systems, the state space is very large, making it infeasible to monitor all paths and 
their coalescence. However, the situation is much simpler if the state space is partially ordered and 
the partial ordering is maintained by the transition rules. This is the case if we have a monotone 
transition rule <fr, such that 4>{x,R) < <f>{y,R) if x -< y , where < denotes the partial ordering, 
x and y are states and R is a random number used to determine the transition. The algorithm 



also works for anti-monotone transition rules: <fi(x,R) > 4>(y,R) if x ■< y (|4ll ). For a state space 
to be partially ordered, it must be reflexive, antisymmetric and transitive. If the monotonicity 
and partial ordering hold, we only need to monitor two paths: an upper path U and a lower 



Perfect Sampling of the Master Equation 



3 



path L, since all other paths lie in between and will coalesce when U and L have done so. This 
'sandwiching' property is illustrated in Fig. [T] The guaranteed sampling of the CFTP scheme 
comes with a trade-off: the run time is unbounded, since T c is itself a random variable, but it is 
finite almost surely. If the run is prematurely terminated by an impatient user, the sample will 
be biased. 

If the system has an unbounded state space, we must instead employ Kendall's Dominated 
CFTP algorithm (39). The DCFTP relies on finding a reversible dominating process D which 
bounds the original process from above and for which the stationary distribution is known. To 
create a dominating process, we exploit the reversibility and stationarity of D: start the chain D 
at t = from the stationary distribution and evolve it until t = T; then use £)_ t = D t as the 
dominating process on the interval [-T, 0]. It then follows that all chains of the original dominated 
process started at t = — oo will be less than or equal to D at time — T and, consequently, chains 
coupled to D started from a state U_t ^ D—t can be interpreted as random realizations of the 
original (dominated) process started from t = — oo. 

Gillespie's SSA implements a continuous time Markov process. As such, it can be reformulated 
in the CFTP framework if the state space is partially ordered and the propensity functions are 
(anti-)monotone. Indeed, one can show that a partial ordering based on the number of molecules in 
each species is maintained for systems of unimolecular reactions with (anti-)monotone propensity 
functions, like those generated by Hill or Monod birth rates (Hemberg and Barahona, unpublished). 
If such a system has a fixed (or bounded) number of molecules, the upper path is trivial and the 
standard CFTP can be directly applied to the SSA. However, in most situations the number of 
molecules is unbounded and we must use the DCFTP algorithm (f30h . 

Based on the discussion above, a perfect sampling DCFTP-SSA can be developed if two pre- 
requisites are fulfilled. Firstly, the Markov process defined by the dominating process must be 
reversible. This means that for every reaction in the system, there must exist another reaction 
with products and reactants exchanged. Note that this is different to requiring that the underlying 
reactions be reversible and it typically means that for every reaction creating a molecule, there 
must exist a degradation reaction. Secondly, we must find a dominating process for the particular 
CME with a known stationary distribution. We will show below that this requirement can be 
fulfilled for genetic and enzymatic networks formed by the interconnection of Hill-type, Monod- 
type and linear enzymatic unimolecular reactions. This is based on the fact that the CME of 
networks of Hill-type or Monod-type elements can be bounded from above by processes based 
on networks of first order reactions, for which the stationary distribution is known to be the 
multivariate Poisson (fH ). 

A brief outline of the DCFTP-SSA is as follows: 



1. Run the dominating process D with known stationary distribution forward in 
time from t = until t — T. 

2. Apply time-reversal to the stationary process D to obtain the dominating pro- 
cess D such that D t = D-t from t — —T to t = 0. 

3. Start upper (U) and lower (L) chains of the original process starting from 
U-t = D-t and L-t = and update each chain coupled to D until t = 0. 

4. If the chains have coalesced at t = 0, the common value Uq — Lq is a sample 
from the target distribution P*. 

5. If coalescence has not occurred, double the running time to t — 2T reusing the 
random numbers from the previous iteration and repeat. 

6. Keep doubling the running time until coalescence has occurred at t = 0. 

The main feature of the DCFTP-SSA is that it provides a certificate for correct sampling 
from the stationary distribution. Therefore we can use the DCFTP-SSA to numerically sample 
the stationary distribution of the system with guaranteed Monte Carlo accuracy. This can be of 
importance in high-dimensional systems with complicated landscapes where the SSA simulations 
can converge slowly. In the first two sections below, we test the application of the DCFTP-SSA 
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Figure 1: An illustrative run of the DCFTP-SSA for the Hill equation. The top panel 
shows the algorithm started from t = — 1. Both the upper (U) and lower (L) are coupled to the 
dominating path (D) and here they fail to coalesce before t = 0. The coupling implies that U and 
L can only change states if D does so. In the lower panel, the algorithm is restarted from t = —2. 
Note that D extends further into the past but the realization of D in the interval t € [—1, 0] is the 
same as above. In this case, coalescence occurs at T c = —0.17 and the value Uq = Lq is guaranteed 
to be a sample from the stationary solution of the Hill CME (Eq. [TU]) . 



to the sampling of stationary distributions. In those cases we only use the final value of each run 
as a sample, thus ensuring that the samples are independent. 

In addition, we can use the algorithm to discard the 'burn-in' period in stochastic simulations 
by running the DCFTP-SSA until a guaranteed sample from the stationary distribution is ob- 
tained and then using it to initiate a run of the ordinary SSA from time t = onwards. This is 
important for the numerical study of stationary properties of systems with high variability, e.g., 
with underlying oscillatory or excitatory behaviour. In the final section, we present examples of 
these applications to the toggle switch and the repressilator. 



Detailed application to the first order reaction 

In order to illustrate the DCFTP-SSA, we study in detail the simple one-dimensional first order 
reaction, for which a full time- dependent analytical solution of the CME is available. This allows 
us to study the convergence of the algorithm, as compared to the standard SSA, and to explore the 
difference between two sources of error which often get entangled: finite sampling error and the 
mis-sampling error due to the fact that we may not have reached the true stationary distribution. 

Consider the simple first order reaction, which has been used as a very simplified description 
of transcription and translation f]~8h : 

=^ i = fc-i, (3) 

where species A is created at a (normalized) constant rate k from a source (0) and degraded to a 
sink (0). The corresponding CME is 

Pj = kPi-x - kP 3 + (j + l)P j+1 - jP 6 = (E- 1 - l)kPj + (E - l)jPj , (4) 

with Pj denoting the probability of having j molecules of A. E and E _1 are the step operators 
defined by van Kampen (fl3h : E/(j) = f(j + 1) and E _1 /0') = f(j — 1) for a function f(j). 



on[3] is 

(13, 42 



is one of a few CMEs for which the analytical expression of the stationary solution 
421 ): it is the Poisson distribution with parameter A = k. More importantly for 



Equation 
is known 

our purposes, one can obtain the full time- dependent solution of the CME (see Appendix EJ. For 
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Figure 2: Comparison of the convergence of the DCFTP-SSA vs. the standard SSA 
for the first order reaction (Eq. [4]) with k = 5. We present the Euclidean error of the 
distribution as a function of the number of Monte Carlo runs N for the DCFTP-SSA (*) and for 
the standard SSA with stopping times: tssA — 1(V), 2(0), 3(D), 15(0)- Note that the error of 
the SSA runs converges to the asymptotic error floors for the different stopping times given by 
Eq. [6] (dotted lines). The dashed line corresponds to the N^ 1 / 2 Monte Carlo scaling and shows 
the correct convergence of the DCFTP-SSA (and the SSA only when the stopping time is with 
very long). Each point is calculated from the mean of 100 different ensembles of N samples, with 
numerical error bars smaller than the symbols. Inset: The full time-dependent solution of the first 
order reaction (Eq. [5]) from a ^-distribution initial condition. The distribution evolves towards a 
smoother Poisson distribution. The dark lines correspond to the distributions at t = 1,2,3 used 
in the main panel. 



the usual initial condition with molecules, the time-dependent distribution P(t) is given by (see 
Fig. El inset): 

Pj (t) P(j, t|0, 0) = eXpH ^~ e " )} [*(1 - e"*)] j , (5) 

which converges to the correct stationary distribution as t — > oo. 

The mis-sampling error can be understood analytically in this simple example. If the standard 
SSA is used to estimate the stationary solution of the CME (Eq. [4]) , N samples from indepen- 
dent SSA runs will be collected at a stopping time issA- This leads to the sampled distribu- 
tion Pssa(-/V, £ssa)> which clearly converges to the stationary distribution P* in the double limit 
^SSA, N — > oo. As N is increased, with fixed isSA, we sample with increasing Monte Carlo accuracy 
from P(£ssa) given by Eq. [5]but not from the true stationary distribution P*. We therefore reach 
an error floor that cannot be broken. Using the analytical expression (Eq. [SJ, the asymptotic error 
floor e|;(isSA) is shown to be 

£w-w=£(^-^) 2 

3=0 j=0 V J ' J ' / 

I {2k)e~ 2k - 21 {2ky/a)e- k - a + I a (2ka) e - 2a , (6) 

where a = 1 — e~' SSA and Io{x) is the modified Bessel function of the first kind. Fig. [5] shows that 
the levelling-off of the Euclidean error for PggA(^V, £ssa) is explained by the error floors calculated 
from Eq. [5] 

This source of error is eliminated in a DCFTP-SSA formulation of this process. The key step is 
to find a reversible dominating process with known stationary distribution. In this example, it is 
clear that the best choice is to use the process (Eq.UJ) itself, i.e., the U and D chains in Fig. [T] will 



<4(issA) 2 = 
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be identical. Figure [5] shows the Euclidean error for the DCFTP-SSA sample distribution, which 
shows no flooring and the expected N^ 1 / 2 scaling with the number of Monte Carlo runs (43l). 
Equivalent results are obtained with other error measures for the sampled distribution, e.g., the 
Kullback-Leibler and Kolmogorov distances, and the x 2 -goodness of fit. 



The building blocks: networks and nonlinear elements 

In order to extend our study to GRNs, we need to consider how to deal with two key ingredients: 
networks of reactions, and nonlinear elements of the Hill and Monod types. 



Application to networks: Simple gene model of two coupled first order 
reactions 

To showcase how to apply the DCFTP-SSA to networks of several species, we use a widely studied, 



simplified model of gene expression (|15l . 1171 . Il8f ). In its simplest form, the model consists of two 
types of molecules: mRNA (M) and proteins (R), which are produced and depleted at constant 
rates. In addition, the mRNA catalyzes protein production through a linear enzymatic reaction: 




The deterministic description is given by 

M = k M - M 

R = k B + k R M - R, (7) 

and the corresponding CME is 

Pm,r = (Ea/ " l)k M PM,R + (Em - l)MP M , fl 

+ (E^ 1 -l)(k B + k R M)P M ,R + {E R -l)RP MiR . (8) 

In fact, Eq. [S]has been solved: the stationary state of the CME of any general network of linear 
reactions is a multivariate Poisson distribution (fl7l |42|). This means that, similarly to the one- 
dimensional first order reaction, the DCFTP-SSA is directly applicable to Eq. [8l since the reaction 
network is reversible and the multivariate Poisson stationary solution is itself a dominating process 
for the system. The existence of an analytical solution allows us to check how the accuracy of 
the DCFTP-SSA depends on the dimensionality of the system. Fig. [3] (inset) shows that the 
marginal distribution (for M) converges like the one dimensional reaction in Fig. [21 with similar 
error floors for the standard SSA. Note, however, that the joint distribution (Fig. [3]) exhibits slower 
convergence of the standard SSA due to the higher dimensionality of the system, indicating the 
need for longer SSA runs to avoid mis-sampling from the true distribution. The DCFTP-SSA 
shows N~ x / 2 Monte Carlo scaling with no error floors regardless of the dimensionality of the 
system. 



Non-linear elements: Hill and Monod reactions 

The Hill reaction is widely used to model the sigmoidal (non-linear) characteristics of many bi- 
ological processes and specifically those involved in genetic regulation (Q, 0, S3). This model 
incorporates negative feedback, whereby the rate of creation of new molecules decreases as their 
concentration increases: 

=> A = -^—-A, (9) 
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Figure 3: The effect of dimensionality on the convergence of the sampled distribution. 
The main panel shows the Euclidean error for the joint distribution of the linear gene 
model (Eq. [8]) with &m = 3, kn = 3 and ks = 1. The open symbols correspond to the standard 
SSA with stopping times as in Fig. [2] while the DCFTP-SSA is marked by asterisks. The points 
are obtained by averaging over 100 ensembles of iV samples, with error bars contained within the 
symbols. The higher dimensionality and the topology of the network make the convergence of the 
standard SSA much slower for this case than for the one-dimensional first order reaction in Fig. [51 
Note that the DCFTP-SSA is unaffected by error floors and shows N^ 1 / 2 convergence, as given 
by the dashed line. Inset: Same as the main panel for the marginal distribution of the mRNA 
(M). In this case, the convergence is as for the one-dimensional linear reaction in Fig. O The 
difference between the convergence of the joint and marginal distributions is not surprising: the 
production of R is downstream from M and it is unlikely to reach the stationary level before M. 



where f(A; k, a) = k / (8 a + A a ) is the state-dependent reaction rate, k is the renormalized reaction 
constant and a is the cooperativity factor (fijl). Eqjpcan be derived from elementary Law of Mass 
Action kinetics through elimination of variables f| 16T ) . The corresponding CME is given by: 

Pj = (E- 1 - 1)^-7^ + (E- l)jPj. (10) 

It is usual to fix 6 = 1 and we do so in the following. For the particular case a = 1, Eq. [5] 
reduces to the familiar Michaelis-Menten equation, for which we can obtain the following analytical 
expression for the stationary distribution (see Appendix [Bjl : P* = cfe- 7 /(j!) 2 , where c = 1/Jo(2Vk) 
is a normalization constant. 

The Monod reaction is used to mo del g ene upregulation, as referred in particular to the auto- 
catalysis common in many eucaryotes ( 10L l44l 1461) . The reaction can be written as 

0^4i0 =► A=-^ r -A, (11) 

6 a + A a 

where g(A;k,a) = kA a /(9 a + A a ) is the state-dependent reaction rate that encapsulates the 
positive feedback. The corresponding CME is given by 

Pj = (E- 1 - ^^-T^ + ( E - 1 )i P i" ( 12 ) 

Again, we fix = 1 in the following. The stationary distribution of the Monod CME fEq. [T2")) with 
a = 1 can be obtained analytically (see Appendix IB"]) : P* = ck J /jl, with normalization constant 
c= l/(e fc + fc-l). 
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For general a, no explicit solution for the stationary distribution of the CMEs Eq. [TO] and 
Eq. [TO] is known, and numerical methods are therefore necessary. Fortunately, the DCFTP-SSA 
is directly applicable to the reversible Hill and Monod CMEs: the first order reaction, studied in 
detail in the previous Section, can be used to produce a dominating process for both processes. 
To see this, note that Eq. [TO] and Eq. [TO] have birth rates that are bounded from above by k at 
all times. It is thus straightforward to make sure that the upper and lower paths have rates that 
fulfill the requirements of the algorithm. As an illustration, Fig. Q]shows one DCFTP-SSA run for 
the Hill CME (Eq. [TO]), where D corresponds to the dominating first order linear process (Eq. [4} 
and U and L are the coupled Hill processes (|39l . [40h . Our detailed simulations of the Hill and 
Monod CMEs (data not shown) show N^ 1 / 2 convergence of the DCFTP-SSA unaffected by the 
mis-sampling errors that can appear when using the standard SSA with short stopping times. 



Application to nonlinear models of Gene Regulatory Net- 
works 

Our foregoing discussion clarifies why the DCFTP-SSA is applicable to reversible GRNs consist- 
ing of interconnected linear enzymatic, Hill and Monod birth reactions: it is possible to define 
a dominating process for these networks based on the associated linear network for which the 
stationary solution (a multivariate Poisson distribution) is known. We now apply our scheme to 
two recent, canonical examples of synthetic GRNs: a bistable genetic toggle switch |5|), with a 
bimodal stationary distribution; and the repressilator (|4j), a simple synthetic oscillator. 



Toggle switch: two coupled Hill equations 

An important characteristic of biochemical networks is the possibility of multistability |H, @, 

3EF 



161 . Il8l . 12(1 1461 . 1471 ). as exemplified by the tog gle switch designed by Gardner et al by combining 
two mutually repressing genes (@, [ll , 14 , lg, 13) ■ This leads to a bistable system with a sharp 
switching threshold: 



f(B-k A ,a) l 

■A 



B T" 

f(A;k B ,l3) 1 



A deterministic model can be derived using two coupled Hill-equations: 

A = — — A 

1 + B a 

B = TT^- S ' (13) 

where A and B are the molecular numbers of the transcription factors. With appropriate cooper- 
ativity factors a, j3 > 1 and reaction constants Ua and ks, Eq. [TO] has two stable fixed points (jH). 
The corresponding CME is given by 

Pa,b = (E^ 1 -1) tt ^P a ,b + (E a -1)AF^ b 

+ (E b 1 -1) t |^P 4 .b + (IEb-1)BPa,b. (14) 

Based on our discussion above, it is easy to see that linear process with constant birth rates &u 
and ks will be a dominating process for Eq. 1141 Since the Hill-function is anti-monotone, we must 
use a cross-over scheme for the updates to make sure that all initial conditions are sandwiched 
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Figure 4: Stationary distribution for the toggle switch (Eq. I14p with parameters 1$a = 

30, fcs = 10, a = 3, j3 = 1. The bimodality of the distribution leads to switching behavior. We 
have checked that the Euclidean error of the DCFTP-SSA sampled distribution decreases at the 
expected N~ x / 2 rate (data not shown). The state space can be divided in two regions and B# 
corresponding to the two basins of attraction of the fixed points. The separatrix is found from the 
sign of the Fiedler eigenvector of the Laplacian (fljl ) of the state space lattice. The probabilities 
P A # and P B # of finding the system in each basin computed with the DCFTP-SSA compare very 
well with those obtained with the approximate eigenvector method (in parenthesis) . Similarly, the 
expected time to reach the separatrix averaged over all the states in each basin (j37l ) calculated 
with the DCFTP-SSA (averaged over 25 ensembles, each consisting of 1000 samples) compare well 
with the approximate eigenvector method (in parenthesis). 



between the upper and lower paths, as described in Haggstrom and Nelander (|4lh. We have 
applied the DCFTP-SSA to Eq. [TJ] to obtain the stationary distribution of the system, which is 
clearly bimodal (Fig. |4j). Because there is no analytical solution in this case, we have checked the 
accuracy and convergence of the DCFTP-SSA against the approximate eigenvector method with 
excellent results (not shown). 

Beyond certifying proper convergence, the DCFTP-SSA can be used to study properties of the 
network that need to be properly averaged over the stationary distribution. Consider the mean 
time for the system to switch from the basin of one fixed point A# to the other i.e., the 
escape time of the CME (Eq. [T4|) . If the standard SSA were to be used, it would be unclear how 
to collect proper Monte Carlo statistics, since no certificate of stationarity exists. In contrast, we 
use a mixed scheme in which an initial DCFTP-SSA run is followed by a (faster) SSA run. The 
initial DCFTP-SSA run ensures that the initial condition for the SSA run is correctly sampled 
from the stationary distribution at t — 0, in essence eliminating the 'burn-in' period. From that 
point onwards, a standard SSA run is performed to obtain statistics of the average time to cross 
the separatrix. The results are reported in Fig. |U 

In their experiments, Gardner et al. jH) were able to induce faster A# — > B# switching by 
reducing the downregulation capability of A in the cellular environment. This can be mimicked 
by a modified CME in which the A^-term in Eq.[TJ]is removed. If we run the SSA of this modified 
CME starting from the stationary distribution of the original system (Eq. [14)) , the escape time 
t a#-*b# becomes approximately two orders of magnitude smaller, in broad agreement with the 
experiment. In fact, it can be shown that the stationary distribution of the modified CME becomes 
unimodal, with no observable peak for species A. 
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Repressilator: six-dimensional system of linear enzymatic and Hill equa- 
tions 

The repressilator, a synthetic system of transcriptional regulators, is perhaps the simplest bio- 
chemical oscillator that has been implemented experimentally Q). It consists of three genes in a 
loop, in which the expression of one gene is inhibited by the product of another gene in succession 
(Fig. [5b): 

Mi = — — d M Mi 

R t = k B + k n Mi-Ri, i = 0,1,2 (mod 3). (15) 

Here Mi are the mRNA levels (with production rate /cm and degradation rate <1m) and Ri are 
the corresponding proteins (with basal rate ks and linear production rate fcfi). This model of the 
repressilator is therefore a network of linear enzymatic elements (Eq. [8]) for the protein produc- 
tion terms and Hill reactions (Eq. ITU]) to modulate the production of mRNA. The deterministic 
system (Eq. [TSJ has been shown to be oscillatory, with the concentrations of the three proteins 
peaking in succession (Fig. [5b, top panel). 
The corresponding CME is given by 

P n = ^(E^-l)— -g— P n + ^ Mi -l)dMMiP n 
i=o U + ^+iJ 

+ (E" 1 - l)(k B + k R Mi)P n + (E Ri - l)RiP n , (16) 

where the shorthand P n denotes the state Pm ,m 1 ,m 2 ,R ,Ri,R2 an d i mod 3. We have simulated 
this six dimensional system using the cross-over scheme described above: we use the DCFTP-SSA 
to discard the 'burn-in period', i.e., the time it takes for the Markov process to reach the stationary 
distribution, such that the initial conditions for the SSA runs are sampled from the stationary 
distribution. As a measure of the stationarity of our DCFTP-SSA initialized sampling, we have 
checked its ergodicity by establishing through an F-test that the average period and amplitude 
of several short simulations are statistically indistiguishible from those obtained from one long 
simulation. 

As can be seen in the sample trajectory in Fig. [5b, the stochastic system is extremely noisy 
but it retains the overall oscillation of the genes in succession. The oscillations do not die out 
even for very long simulations and we have collected statistics of such long runs. Fig. [5]c shows 
a robust feature of the oscillator: the distribution of the period is well fitted by a generalized 
Gamma distribution, which is characteristic of excitatory systems with a refractory period. The 
oscillatory behaviour is present for a wide range of the parameters fcj\/, k R , cLm (49), which could 
be modified experimentally. Fig. Oi summarizes our investigation of the robustness of the repres- 
silator to parametric variation. Note the good agreement of the calculated mean period with the 
corresponding deterministic values. The simulations show that changes in parameters ku and k R 
produce a similar, small effect on the period. The system is, however, more sensitive to parametric 
changes in dm- The evaluation of how these parametric variations could be used for the design of 
tunable and reliable oscillators will be the object of further study 

Discussion 

In recent years, the importance of stochastic effects in gene regulation has been elegantly demon- 
strated through experiments. ^From a theoretical perspective, such systems are usually analyzed 
numerically with the standard SSA. When studying stationary properties, this raises questions 
about the sampling of the stochastic process since the SSA does not provide explicit guarantees 
for convergence. One issue of interest for the use of the DCFTP-SSA is that the shape of the 
stationary distribution can be more important than the dimensionality of the system for the con- 
vergence of the stochastic simulations. For systems with smooth unimodal distributions, such as 
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x k* 

Figure 5: Analysis of the repressilator with ku = 10, cIm = 1, a = 2, fog = 1, kp = 3. 

(a) Diagrammatic representation of the repressilator network showing the alternating positive 
and negative feedbacks, (b) Top: One SSA realization of the CME (Eq. [To]) started from the 
stationary distribution showing the concentrations of the three proteins. The oscillations are very 
noisy and far from sinusoidal. The Fourier spectrum shows no distinguishable peaks (data not 
shown) . Bottom: The corresponding deterministic solution (Eq. 1 1 5j) exhibits regular oscillations 
of the proteins in succession, (c) Distribution of the period r at stationarity obtained from a long 
SSA run. The solid line corresponds to a best fit to a generalized Gamma distribution while the 
dotted line shows a Poisson distribution with the same mean, (d) Change of the period of the 
repressilator as a function of parameter variation. There is good agreement between the sensitivity 
of the stochastic system with respect to a ±5% change in ku (*), kp (□), and oIm (0)> an d the 
deterministic system (solid lines) . The effect of oIm on the duration of the period r is greater and 
opposite to that of k^i and kp, which are similar. 
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the repressilator, the SSA approaches the stationary distribution rapidly, regardless of the initial 
conditions. However, the ordinary SSA can be highly sensitive to initial conditions for systems 
with multimodal distributions, such as the toggle switch. In general, the SSA will converge rapidly 
to the nearest mode. If we are primarily interested in escape times, they can be estimated ac- 
curately by using the ordinary SSA started close to one of the modes. However, if the escape 
times from the modes are long and the ordinary SSA is used, we run the risk of mis-sampling the 
stationary distribution, specifically in important areas of low probability such as the separatrix. 
The DCFTP-SSA circumvents this problem but at the cost of longer coalescence times. 

The guaranteed sampling provided by the DCFTP-SSA comes at an extra computational cost. 
In general, the coalescence time will depend on the topology of the network and the form of the 
reactions. However, the CPU overhead is in no way prohibitive and the DCFTP-SSA can be run on 
an ordinary desktop computer for the systems presented in this paper. An important theoretical 
feature of the DCFTP-SSA is the fact that its runtime is not bounded. As our simulations 
show, this feature does not seem to impose practical restrictions on the algorithm. If necessary, 
however, this issue can be resolved by using FMMR (|5 lh . another perfect sampling algorithm, 
which dispenses with variable stopping times by running the Markov chains for a fixed time 
and using rejection sampling to discard those simulations that have not reached the stationary 
distribution. 

In its current form, the DCFTP-SSA can be applied to reversible systems of linear and nonlinear 
(anti-)monotonic unimolecular reactions for which there exists a dominating process with known 
stationary distribution. We note that the algorithm is applicable not only to networks of linear 
enzymatic, Hill and Monod birth reactions of standard form, but also to reactions whose rate 
equation is a rational function that can be dominated by a linear process. This condition is 
equivalent to requiring that the polynomial in the denominator of the propensity function have 
a higher degree than the polynomial in the numerator. These types of rational functions are 
frequently obtained as compound unimolecular rate laws to represent more complex enzymatic (fiih 
or gene regulatory reactions (|52l ). An example of such functions is the model for the A-phage lysis- 
lysogeny switch (fill . [lib . which we have also simulated with the DCFTP-SSA (results not shown). 
Bimolecular and two-substrate enzymatic reactions are not included in this group and extending 
the DCFTP or CFTP to provide perfect sampling for the generic SSA of second order reactions 
will entail further research. 

The DCFTP-SSA introduced here can be viewed as a reformulation and extension of the Gille- 
spie algorithm that provides guaranteed sampling from stationarity for a generic class of systems 
relevant in GRNs, thereby removing a source of uncontrolled uncertainty from the simulations. 
By eliminating these extraneous sources of error, comparisons between the predictions of different 
stochastic models at stationarity can be performed more meaningfully In the case of the repres- 
silator, we provided an accurate numerical characterization of the stationary distribution for the 
period of this stochastic oscillator. Because stationarity is guaranteed, the characteristics of this 
distribution can be correlated with changes in the parameter values (as shown in Figure EH) or 
compared with those of other genetic oscillators. This could be a useful tool for the analysis and 
design of synthetic transcriptional oscillators and will be the subject of future research. 
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A Time-dependent solution of the Master Equation for the 
one-dimensional first-order reaction 

The CME for the one-dimensional first order reaction (Eq. [4| has the same form as a homogenous birth- 
death process on the non-negative integers. It is also identical to the equation one obtains when studying 
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the length of a M/M/oo-queue with Poisson arrival and exponential service time (|37f ). Associated with 
Eq . [4[ t here exists a family of orthogonal polynomials {ipj(x)}Jt Q with the three term recurrence rela- 
tion (|53l ): 

-xtpj(x) = jipj-!^) ~ (k + j)ifj(x) + kip j+1 (x), (17) 

where <fio(x) = 1 and fj(x) = for j < 0. Karlin and McGregor showed that the recurrence rela- 
tion (Eg. I17[l leads to a spectral representation for Pji(t), the transition probability of going from state i 
to state j in time t: 

f-OC 

Pji{t)=Pj e- xt <pi{x)ip j (x)d4>{x), (18) 
Jo 

where 4>(x) is a positive measure on x and P* is the stationary distribution. 

The recurrence relation (Eq. I17[) is satisfied by the Charlier polynomials with <f>(x) equal to the Poisson 
distribution |H3) . i.e., the Charlier polynomials are a family of polynomials which are orthogonal with 
respect to the Poisson measure on a discrete lattice on the non-negative integers |H3) . The Charlier 
polynomial of order i, parameter a and argument x is denoted C; (x\ a) . 

Using the following bilinear generating form of the Charlier polynomials fSB]) 

jr^a (m; x) Ci (n;v)=e*(l- Z -) m (l- Z 7 ) n C m (n; {x Z*L \ (i 9 ) 

i=0 



fe! \ a;/ y y 

Lee (|56|) showed that Eq. [18] can be simplified to give: 

exp(-fc(l-e" t )) _ -t\i^f- *(l-e"*) a l 

i > i»(t) = Ti i-[fc(l-e )J (1-e ) C, j; — . (20) 



The important special case with initial condition given by a (^-distribution at the origin can be simplified 
even further. If Pj(0) = 5(0), then Co(j;a) = 1 and Eq. [201 becomes 

P(j,t\0, 0) = eXP( ' fc ^ e ' t)} [*(1 - e~*)] 1 = m, (21) 

which is a Poisson distribution with time- varying parameter k(t) = fc(l — e~'). Therefore, the stationary 
Poisson distribution P* — fe J e _fc /j! is approached exponentially fast. 
Note that 

lim P^t) = Z-JL, (22) 

which is another way of showing that the stationary distribution the process is Poisson no matter what 
the initial condition is. 



B Stationary solution of the Hill and Monod CMEs with 

a = 1 

The stationary solution for the Hill CME (Eq. I10[) with cooperativity factor a = 1, which is equivalent to 
the Michaelis-Menten reaction, can be obtained analytically. This reaction can be viewed as a non-linear 
birth-death process, which allows us to write the stationary distribution P* as (|37h 

= c A A 1 ---A j - 1 
/ii/i 2 ■■•m 

where Xi and fli are the birth and death rates of state i and c is a normalization constant. Inserting the 
values from the CME (Eq. ITOjl we obtain 

p * _ i lflTT-' - e+jpr _ i fc J r(6> + l) 

with normalization constant _ _ _ 

_ 9I e (2Vk) + VkI 1+e {2Vk) 

where 7e(x) are Bessel functions. For the special case of 9 — 1, we use a recurrence relation for Bessel 
functions to simplify the above result to Cq 1 = /o(V2fe). 
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Consider now the Monod CME (12) with a = 1: 

(E- 1 -l)^-P 3 + (E-l)jP J if j > 1 
k if j = 0. 

Note that the birth rate of the state j — is modified to avoid it becoming absorbing. Using the same 
strategy as for the Michaelis-Menten CME above, one can find the stationary distribution 

k _k fc(j-l) 



9+t"'9+ft _ fc 3 r(e + i) 

JT(J + 9) 



P * = c ^ = c , :;;; i 1 ;^ , (25) 



with normalization constant c = k (1 + i-Fi(2, 1 + 6, k)), where 1F1 is Gauss' hypergeometric function. 
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